The LDBC Graphalytics Benchmark
In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Because such platforms must address a diverse set of bottlenecks and performance issues, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of system scalability (weak and strong) and robustness (such as failures and performance variability). The benchmark also balances comprehensiveness with the runtime necessary to obtain the deep metrics. The benchmark comes with open-source software for generating performance data, for validating algorithm results, for monitoring and sharing performance data, and for obtaining the final benchmark result as a standard performance report.
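The scalability and variability metrics named above can be illustrated with their textbook definitions. The following is a minimal Python sketch under that assumption; the function names and the choice to normalize against the smallest measured worker count are illustrative, not the Graphalytics harness's actual metric definitions or API.

```python
import statistics


def strong_scaling(runtimes):
    """Strong scaling: the dataset is FIXED while worker count p grows.

    runtimes maps p -> runtime in seconds. Returns p -> (speedup, efficiency),
    both relative to the smallest p measured (an illustrative convention).
    """
    base_p = min(runtimes)
    base_t = runtimes[base_p]
    result = {}
    for p, t in sorted(runtimes.items()):
        speedup = base_t / t
        efficiency = speedup / (p / base_p)  # 1.0 means perfect linear speedup
        result[p] = (speedup, efficiency)
    return result


def weak_scaling_efficiency(runtimes):
    """Weak scaling: dataset size grows proportionally with p.

    Ideal weak scaling keeps runtime constant, so efficiency is T(base) / T(p).
    """
    base_t = runtimes[min(runtimes)]
    return {p: base_t / t for p, t in sorted(runtimes.items())}


def coefficient_of_variation(samples):
    """Performance variability of repeated runs: sample stdev divided by mean."""
    return statistics.stdev(samples) / statistics.mean(samples)
```

For example, `strong_scaling({1: 100.0, 2: 55.0, 4: 30.0})` reports a speedup of about 3.33 on 4 workers, i.e. an efficiency of about 0.83. Expressing variability as a ratio to the mean lets runs of very different lengths be compared on one scale.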
Exploring HPC and Big Data Convergence: A Graph Processing Study on Intel Knights Landing
The question 'Can big data and HPC infrastructure converge?' has important implications for many operators and clients of modern computing. However, answering it is challenging. The hardware currently differs and is evolving fast: big data uses machines with modest numbers of fat cores per socket, large caches, and much memory, whereas HPC uses machines with larger numbers of (thinner) cores, non-trivial NUMA architectures, and fast interconnects. In this work, we investigate the convergence of big data and HPC infrastructure for one of the most challenging application domains: highly irregular graph processing. Through a systematic experimental study of over 300,000 core-hours, we contrast the performance of a modern multicore, Intel Knights Landing (KNL), with that of traditional big data hardware in processing representative graph workloads using state-of-the-art graph analytics platforms. The experimental results indicate that KNL is convergence-ready, performance-wise, but only after extensive, expert-level tuning of software and hardware parameters.
Graphless: Toward serverless graph processing
Our society is increasingly solving complex problems through the use of graph processing. Existing graph processing systems focus on performance, which allows addressing ever-larger and more complex problems. They also require uncommon expertise to deploy and use properly. To make graph processing generally accessible (to small and medium enterprises and institutions, to common research groups, and to individuals), in this work we design and implement the Graphless graph-processing system. Graphless is based on the serverless paradigm, which proposes to simplify computing by letting developers focus only on small, stateless functions that are deployed and managed automatically. With Graphless, we address the key challenge of combining the stateless functions assumed by serverless computing with the (opposite) data-intensive nature of graph processing. Graphless tackles this challenge through an architectural approach that allows it to deploy with push or pull operation, and a collection of backend services, such as an orchestrator and a memory-as-a-service component. We implement Graphless and use it to conduct real-world experiments with Amazon Lambda as the source of cloud-based serverless resources. Using the LDBC Graphalytics benchmark, we analyze Graphless and compare its performance and operational cost with those of the graph-processing systems Apache Giraph (big data domain) and GraphMat (HPC domain). Overall, we show evidence that Graphless provides performance and cost-efficiency similar to Giraph for algorithms that can benefit from fine-grained elasticity, and lower than GraphMat, but it is architecturally easier to deploy and provides both push and pull operation.
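Push and pull are standard styles for organizing vertex-centric graph computation. As a hedged illustration, not Graphless's implementation, here is one PageRank iteration written both ways in Python; the function names, the adjacency-list layout, and the omission of dangling-vertex handling are assumptions made for brevity.

```python
def transpose(out_edges):
    """Build in-edge lists from out-edge lists (needed by the pull style)."""
    in_edges = [[] for _ in out_edges]
    for u, targets in enumerate(out_edges):
        for v in targets:
            in_edges[v].append(u)
    return in_edges


def pagerank_push(out_edges, ranks, damping=0.85):
    """Push style: each vertex scatters its rank share along its OUT-edges,
    writing into its neighbours' accumulators.
    Simplification: vertices without out-edges simply drop their rank."""
    n = len(ranks)
    new = [(1.0 - damping) / n] * n
    for u, targets in enumerate(out_edges):
        if targets:
            share = damping * ranks[u] / len(targets)
            for v in targets:
                new[v] += share
    return new


def pagerank_pull(in_edges, out_degree, ranks, damping=0.85):
    """Pull style: each vertex gathers rank from its IN-edges and writes
    only its own new value."""
    n = len(ranks)
    return [
        (1.0 - damping) / n
        + damping * sum(ranks[u] / out_degree[u] for u in in_edges[v])
        for v in range(n)
    ]
```

On the toy graph `out_edges = [[1, 2], [2], [0]]` with uniform starting ranks, both functions return the same ranks up to floating-point rounding. The trade-off is where state is written: push needs concurrent updates to neighbours' values to be combined, while pull needs the reverse adjacency but lets each vertex write only its own result.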
The AtLarge vision on the design of distributed systems and ecosystems
High-quality designs of distributed systems and services are essential for our digital economy and society. We identify the mounting pressure of scale and complexity of (eco-)systems, of ill-defined and wicked problems, and of unclear processes, methods, and tools as a threat that could slow down the stream of working designs. We envision design itself as a core research topic in distributed systems, to understand and improve the science and practice of distributed (eco-)system design. Toward this vision, we propose the AtLarge design framework, accompanied by a set of 8 core design principles. We also propose 10 key challenges, which we hope the community can address in the next 5 years. In our experience so far, the proposed framework and principles are practical and lead to pragmatic and innovative designs for large-scale distributed systems.